Tree Transformations in Data-Driven Dependency Parsing
Abstract
Tesnière (1959) is often regarded as the founder of the modern theoretical tradition of dependency grammar, but dependency grammar is not one well-defined theory. Instead, it comprises several related theories that share some core notions. One such notion is that of directed relations between pairs of lexical elements, but the formalisms and the criteria for establishing these relations can vary. The primary goal of this study is to investigate different criteria for imposing these relations in a sound way with respect to parsing accuracy. The parser used here is a data-driven dependency parser, and the machine learning method in focus is memory-based (or instance-based) learning. The question investigated is which kinds of dependency structures are easier for a machine learning method to learn and construct. In other words, how can parsing accuracy be increased by changing the annotation by means of tree transformations, without losing or distorting information?
So what criteria can motivate the directed lexical relations in a dependency-based theory? Given that parsing is usually just an intermediate step, the basic question asked is whether the relations should be based on syntactic or semantic criteria. One motivation for using semantic criteria is that they might make the extraction of semantic information less complicated. In several situations, more than one analysis can be said to satisfy the criterion of being a syntactic relation, and in other situations syntactic and semantic relations may coincide. Issues like these are taken into consideration here.
In order to do robust, data-driven dependency-based parsing for Swedish, a dependency-based treebank containing Swedish sentences is needed. Dependency-based treebanks have been developed for several languages, e.g. the Prague Dependency Treebank (PDT) for Czech (Hajič et al. 2001). The Swedish treebank Talbanken (Einarsson 1976), created in the 1970s, has been reconstructed so that different dependency-based versions can be extracted (Nilsson et al. 2005). Talbanken, refined using the associated reconstruction and tree transformation programs, thus constitutes a suitable data resource for this study.
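The syntactic-versus-semantic question can be made concrete with a minimal sketch of a reversible tree transformation. This is an illustration under stated assumptions, not the transformation used in this study.

The sketch makes the following assumptions:
- It uses a hypothetical label set: SUBJ, ROOT and OBJ, plus VC for a main verb attached to its auxiliary and AUX for an auxiliary attached to its main verb.
- It uses a hypothetical suffix `~aux` to record which dependents were moved, so that the inverse can restore the original tree exactly.
- It only handles verb chains with a single auxiliary.

The example sentence "Han har sett filmen" ("He has seen the film") is chosen for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Token:
    form: str
    head: int   # 1-based index of the head token; 0 = artificial root
    label: str  # hypothetical dependency label, not Talbanken's tag set


# Hypothetical suffix recording dependents moved away from an auxiliary,
# so that the inverse transformation can move them back.
MARK = "~aux"


def aux_to_semantic(tree: List[Token]) -> List[Token]:
    """Syntactic analysis (auxiliary heads the main verb) -> semantic
    analysis (main verb heads the auxiliary). Assumes one auxiliary per chain."""
    t = list(tree)  # Token is immutable, so the input tree is left unchanged
    for m in range(1, len(t) + 1):
        verb = t[m - 1]
        if verb.label != "VC" or verb.head == 0:
            continue
        a = verb.head
        aux = t[a - 1]
        # The main verb takes over the auxiliary's attachment and label.
        t[m - 1] = replace(verb, head=aux.head, label=aux.label)
        t[a - 1] = replace(aux, head=m, label="AUX")
        # Remaining dependents of the auxiliary move to the main verb, marked.
        for d, dep in enumerate(t, start=1):
            if dep.head == a and d != m:
                t[d - 1] = replace(dep, head=m, label=dep.label + MARK)
    return t


def aux_to_syntactic(tree: List[Token]) -> List[Token]:
    """Inverse of aux_to_semantic: makes the auxiliary the head again."""
    t = list(tree)
    for a in range(1, len(t) + 1):
        aux = t[a - 1]
        if aux.label != "AUX" or aux.head == 0:
            continue
        m = aux.head
        verb = t[m - 1]
        # The auxiliary takes back the main verb's attachment and label.
        t[a - 1] = replace(aux, head=verb.head, label=verb.label)
        t[m - 1] = replace(verb, head=a, label="VC")
        # Only dependents carrying the marker go back to the auxiliary.
        for d, dep in enumerate(t, start=1):
            if dep.head == m and dep.label.endswith(MARK):
                t[d - 1] = replace(dep, head=a, label=dep.label[: -len(MARK)])
    return t


# "Han har sett filmen" ("He has seen the film"), syntactic analysis.
sentence = [
    Token("Han", 2, "SUBJ"),
    Token("har", 0, "ROOT"),
    Token("sett", 2, "VC"),
    Token("filmen", 3, "OBJ"),
]

semantic = aux_to_semantic(sentence)
print([(tok.form, tok.head, tok.label) for tok in semantic])
# [('Han', 3, 'SUBJ~aux'), ('har', 3, 'AUX'), ('sett', 0, 'ROOT'), ('filmen', 3, 'OBJ')]
print(aux_to_syntactic(semantic) == sentence)  # True
```

Marking the moved dependents is what keeps this sketch lossless. Without the marker, the inverse could not tell the subject (moved from the auxiliary) from the object (already attached to the main verb). A parser could then be trained on either representation, and its output could be mapped back to the original annotation.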